Speech/Nonspeech Segmentation in Web Videos

Author

Ananya Misra

Abstract

Speech transcription of web videos requires first detecting segments with transcribable speech. We refer to this as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos may have a large variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on YouTube videos compared to the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
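
The frame error rate quoted above can be read as the fraction of audio frames whose speech/nonspeech label disagrees with a reference labeling. The sketch below is a minimal illustration under that assumption; the function name `frame_error_rate` and the 0/1 label encoding are hypothetical and not taken from the paper, and the toy labels are made up for the example.

```python
from typing import Sequence


def frame_error_rate(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Return the fraction of frames whose predicted label differs from the reference.

    Labels are assumed to be 1 for speech and 0 for nonspeech (an illustrative
    encoding, not one specified by the paper).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of frames")
    if len(reference) == 0:
        raise ValueError("at least one frame is required")
    # Count the frames where the two labelings disagree.
    errors = sum(1 for ref, hyp in zip(reference, predicted) if ref != hyp)
    return errors / len(reference)


# Toy example: 8 frames, 2 of which are misclassified.
reference = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
print(f"{frame_error_rate(reference, predicted):.1%}")  # prints 25.0%
```

Under this reading, a false alarm (nonspeech labeled as speech) and a miss (speech labeled as nonspeech) each count as one frame error.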

Similar resources

Do infants detect a-v articulator congruency for non-native click consonants?

In a prior study infants habituated to an audio-only labial or alveolar, native English voiceless or non-native ejective stop, then saw silent videos of stops at each place [1]. 4-month-olds gazed more at congruent videos for native and non-native stops. 11-month-olds preferred congruence for native stops but incongruence for non-native ejectives, suggesting language experience biases but does ...

Fast lip tracking for speech/nonspeech detection

Spoken Language Systems, Saarland University, 66041 Saarbrücken. Efficient speech/nonspeech detection is an important part of any speech recognition system. It allows a good estimation of the background noise, which can be used for noise cancellation techniques like spectral subtraction. Furthermore, it avoids running the speech recognizer on unwanted segments of the audio stream. Recent...

Segregation of unvoiced speech from nonspeech interference.

Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, hence more susceptible to interference. This study proposes a new approach to the problem of segregating...

Audio Segmentation using Line Spectral Pairs

This paper describes a technique for unsupervised audio segmentation. Main objective of the work presented in this paper is to study the performance of audio segmentation system using metric-based method. The system first classifies the audio signal into speech and nonspeech signal using variance of zero crossing rate. The feature Line spectral pair is used for automatically detecting the speak...

Content analysis for audio classification and segmentation

In this paper, we present our study of audio content analysis for classification and segmentation, in which an audio stream is segmented according to audio type or speaker identity. We propose a robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence. Audio classification is processed in two steps, which makes it suitable ...

Publication date: 2012
